Abstract


Despite its ubiquity in human learning, very little work has been done in artificial intelligence on
agents that learn from interactive natural language instructions. In this paper, we examine the problem
of learning procedures from interactive situated instruction, in which the student is attempting to
perform tasks within the instructional domain, and asks for instruction when it is needed. We present
Instructo-Soar, a system that behaves and learns in response to interactive natural language instructions.
Instructo-Soar learns completely new procedures from sequences of instruction, and also learns how to
extend its knowledge of previously known procedures to new situations. These learning tasks require
[...]. Instructo-Soar exhibits a multiple execution learning process in [...]


1 Introduction 

The hallmark of universal computation systems is their ability to take instructions. This ability separates
[...] which can perform only a limited number of tasks. Instructability allows [...] number of [...].
However, computers take instructions in a way that is radically different from the way humans do.
Computers receive instructions in the form of programs. This mode of communication from instructor
(programmer) to computer is characterized by the following properties:

• Artificial language. [...] must translate knowledge into a language that requires precise [...].

• [...] does not occur within the context of the computer [...].

• Non-interactive instruction. The instructor determines when and what to instruct without any
feedback from the computer.

These properties have a number of implications for the instructor: 

1. The instructor must know what procedures are already encoded in the computer, to avoid redundancy
and conflicts.

2. The instructor must understand the effects of long sequences of instruction, because a complete
instructional sequence must be generated.

3. The instructor must create a procedure that applies in every situation it will be exposed to. 

4. The instructor must specify the procedures in complete detail; no steps may be omitted or abstracted. 


Proceedings of the Tenth International Conference, ed. Paul E. Utgoff,
In contrast, humans can engage in apprenticeship instruction, in which the student actively tries to
acquire knowledge to aid in problem solving. This type of instruction has the following properties:

• Natural language. Instructor and student speak the same language, and the language is highly [...]

• Situated instruction. The instructor and student are situated within a specific task. The instructor
does not need to predict the effects of long instruction sequences because the student performs the
task in response to individual instructions. The instructor needs to produce instructions for only the
situation at hand, not for all possible situations. The student can use the situation to disambiguate
instruction and cue the recall of relevant domain knowledge.

• Interactive instruction. The student can request help during a task. This frees the instructor from
[...]; more instruction can be requested.

[...] Instructo-Soar, a system that learns procedures from interactive natural language [...] not know
how to apply in the current situation, and for completely new complex operators [...]
comprehension. During later executions of the task, analytic techniques generalize the procedure.


2 Related Work 

There have been a variety of systems that learn from [...] 1987; Wilkins [...] 1987, Laird [...].
However, the instructor can only select actions that [...]; typically, the system suggests steps [...]
problem state. LAS’s cannot take instructions specifying new, [...]
Pick up the red block.

Move to the yellow table. 

Move the arm above the red block. 

Move up. 

Move down. 

Close the hand. 

Move up. 

The operator is finished. 

(b)


Figure 1: (a) Initial situation of agent; (b) Instructions to teach operator.


3 Instruction within an autonomous agent 

One factor that most previous work on instruction has ignored is the integration of learning from instruction 
within an autonomous agent. To learn from interactive instruction, an autonomous agent must have general 
reasoning capabilities, and be able to recognize when its knowledge is insufficient and instruction is needed. 
Instructions must be assimilated into a possibly large body of existing knowledge, and instructional learning 
must be smoothly integrated with the agent’s other learning and problem solving methods. An agent must 
maintain its ability to respond to its environment even while accepting instructions, and must be able to 
apply learning from instruction to a wide variety of tasks. 

Supporting these capabilities depends in part upon the architecture in which the agent is constructed.
We use Soar [Laird et al., 1987] as our underlying architecture. Soar’s basic structure provides a framework
in which these capabilities can be approached. 

In Soar, all activity occurs by applying operators to states within problem spaces, supporting general 
problem solving and planning. Our instruction learning techniques learn and extend operators, and thus have 
the potential to be applicable to any problem encoded in Soar. When a Soar agent cannot make progress 
within a problem space, an impasse arises, and a subgoal is generated to resolve the impasse. Any type of 
knowledge might be applied within the subgoal, so learning from instruction can co-exist with learning from 
other knowledge sources, such as experimentation, analogy, etc. Learning occurs through chunking, a form 
of explanation-based learning, which summarizes the results of subgoal processing, avoiding the impasse in 
the future. Chunking occurs over all subgoals, so instructional learning can be performed as part of the 
ongoing activity of the agent, rather than using a separate mechanism that interrupts the normal course of 
activity. 

Instructo-Soar’s problem spaces implement an agent with three main categories of knowledge: natural
language processing knowledge, originally developed for NL-Soar [Lehman et al., 1991]; knowledge about
obtaining and using instruction; and knowledge of the task domain itself. This task knowledge is extended 
through learning from instruction. Assumed characteristics of the Instructo-Soar agent include: 

1. Relevant relationships. The agent has knowledge of all of the relevant task relationships, and can 
derive them from perception. 

2. Primitive operators. The agent knows a set of primitive operators that it can execute, internally 
simulate, and map natural language to. 

3. Reading ability. The agent can read the instructions, even if it has no knowledge of an operation 
within the current task domain that corresponds to the instruction. 

4. Locality of instruction. The agent assumes that an instruction applies to the most recent unachieved 
operation. 

4 Learning from instruction 

Consider the agent and situation shown in Figure 1(a). The agent has primitive operators for moving to 
tables, opening and closing its hand, and moving its arm up, down, and into relationships with objects (e.g., 
above blocks). This is the primary domain Instructo-Soar has been applied to; the techniques have also been 
applied in a more limited way to a flight domain, in which Soar controls a flight simulator and instructions 
are given for simple maneuvers like taking off [Pearson et al., 1993]. To explain Instructo-Soar’s performance,
we will use the example of teaching the agent in Figure 1(a) to pick up blocks. Since picking up blocks is 
not a known operator, when told “Pick up the red block,” the agent must learn a new operator. 

To learn a new operator, an agent must learn each of the following: 
1. Mapping from natural language: What instruction(s) map onto the new operator. The mapping
allows the agent to select the operator when commanded in the future.

2. Operator template: Knowledge of the operator’s arguments and how they can be instantiated. For
picking up blocks, the agent acquires a new operator with a single argument, which may be instantiated
with any block that isn’t currently picked up.

3. Implementation: How to perform the operator. New operators are built from primitive and/or
previously learned operators, so implementation takes the form of a series of sub-operators (e.g., move
to the proper table, grasp the block, etc.).

4. Termination: Knowledge of when the new operator is achieved. This is the goal concept of the new
operator. For “pick up”, the termination conditions include holding the desired block, with the arm
up.
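
The four parts of a learned operator listed above can be pictured as one record. The following is a
minimal Python sketch under stated assumptions: the class name, field names, and the flat dictionary
state encoding are all illustrative inventions, and Instructo-Soar itself encodes this knowledge as
Soar rules rather than as a data structure like this one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, object]  # hypothetical flat world state: feature -> value


@dataclass
class LearnedOperator:
    """Illustrative container for the four things learned about a new operator."""
    name: str                                # internal name, e.g. "new-op14"
    nl_verbs: List[str]                      # 1. mapping from natural language
    arg_types: Dict[str, str]                # 2. template: argument -> required type
    termination: Callable[[State], bool]     # 4. goal test on a state
    implementation: List[str] = field(default_factory=list)  # 3. sub-operator names


def holding_and_up(state: State) -> bool:
    # Assumed encoding of "holding the desired block, with the arm up".
    return state.get("holding") == "block" and state.get("arm") == "up"


pick_up = LearnedOperator(
    name="new-op14",
    nl_verbs=["pick up"],
    arg_types={"object": "block"},
    termination=holding_and_up,
    implementation=["move-to-table", "move-above", "move-down",
                    "close-hand", "move-up"],
)
```

Keeping the termination test separate from the implementation mirrors the text: the goal concept
describes when the operator is achieved, independently of the sub-operators used to reach it.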

Instructo-Soar handles simple imperative sentences, and learns a straightforward mapping of an
instruction’s semantic argument structure to a newly generated operator template. In general, mapping
from instructions to task operators and objects can be difficult, as it can require complex natural
language comprehension, and possibly reasoning about the task itself [Huffman and Laird, 1992; DiEugenio, 199 ].

To learn a general operator implementation from instructions, an agent must determine the proper scope 
of applicability of each instruction. Some features of the current task and situation are important conditions,
while others may be ignored. For example, when told to pick up a red block, does it matter that the block
is red? Perhaps, if building a stoplight. But if trying to block open a door, the key feature may be the
block’s weight. Thus, learning from instruction can involve both generalization (that red doesn’t matter,
although explicitly mentioned) and specialization (that the weight matters, although not mentioned). 

To learn general implementations, Instructo-Soar uses explanation-based learning (EBL) as realized by 
chunking in Soar. Proper generalization requires understanding how each instruction contributes to achieving 
the goal. However, during the initial execution of the instructions for a new operator, the agent does not
know the termination conditions (goal concept) of the operator; therefore, generalization on this basis is
impossible. Thus, initial learning is based on a weak inductive step: believing what the instructor says.
This learning is rote and overspecific, with an “episodic” flavor. At the end of the initial execution, the
termination conditions, or goal concept, of the new operator can be induced. On later executions, the agent
can form an understanding of how the instructions, recalled from its episodic memory, allow the goal to be 
reached, using its knowledge of primitive operator effects (domain theory). This allows the implementation 
sequence to be learned deductively (and generally), based on achievement of the induced goal concept. We 
will describe the details of the technique using an example. 


5 Example 

Consider the agent shown in Figure 1(a) being instructed to pick up blocks. The agent is given the instructions 
shown in Figure 1(b) during the course of performing the task. 


5.1 First execution 

The agent begins with knowledge of the primitive operators, but no knowledge of the new operator it will 
be instructed to perform. Following the first execution, the agent must be able to perform at least the exact 
same task without being re-instructed. Thus, the agent must learn, in some form, all of the parts of the new
operator, as described in the previous section.

The first instruction given is “Pick up the red block.” It is comprehended using Soar’s natural language
capability, NL-Soar [Lehman et al., 1991], which produces a semantic structure and resolves the red block
to a block in the agent’s environment. However, the semantic structure produced does not correspond to
any known operator, indicating that the agent must learn a new operator. Thus, a new, empty operator is
generated (e.g., new-op14), with an argument structure that directly corresponds to the semantic structure’s
arguments (here, one argument, object). The system learns a mapping from the semantic structure to the
new operator, heuristically restricting arguments to be of the same type (e.g., isa block) as in the current
instruction.

Next, the new operator is selected for execution. Since the agent doesn’t know any implementation for
the operator, it immediately impasses and asks for further instructions. Each instruction in Figure 1(b)
is given, comprehended and executed in turn. For instance, “Move to the yellow table” is comprehended,
mapped to a known operator, and executed. These instructions provide the implementation for the new
operator.

The instruction “Move the arm above the red block” provides an example of learning to achieve the
preconditions of a known operator. This operation is known, and the agent can perform it when its hand
is in the upper plane. However, here the hand is in the lower plane, so the operation cannot be performed
directly. Thus, the agent asks for instruction about this operator, and is told to move the arm up. This
achieves the precondition, and after moving up the agent has sufficient knowledge to complete the move above
operation without further instruction. As a result, a rule is learned that will propose moving up to achieve
this precondition in the future. Thus, a known operation has been extended to apply in a new situation;
further instruction could extend it even further, for instance allowing the agent to “move above” starting
from a state where it’s not even next to the table. Note that the interactive nature of instruction means
that the instructor need not know beforehand whether the agent knows how to apply an operation from the
current situation. This recursive instruction of sub-operations could be multiple levels deep, allowing for a
great flexibility of instruction sequences, depending on what the agent already knows. A simple mathematical
analysis shows that for a sequence of only six primitive actions, with preconditions for each action, over 100
possible sequences of interactive instruction are possible [Huffman, 1992]. This contrasts with learning from
observation, in which systems learn from observing only the sequence of primitive operations performed to
carry out the task.

After completing the “move above” action, the agent continues receiving and executing instructions for
the new pick up operator. Ultimately, the implementation sequence for “pick up” will be learned at the
proper level of generality, based on understanding how each instructed operator leads to successful execution
of the new operator. During the initial execution, however, this is impossible, because the goal of this new
operator is not yet known. Thus, the agent resorts to rote learning, recording exactly what it was told to
do, in exactly what situation.

This recording process is not an explicit memorization step; rather, it occurs as a side effect of language
comprehension. While reading each sentence, the agent learns a set of rules that encode the sentence’s
semantic features. The rules help NL-Soar to resolve referents in later sentences (implementing a simple
version of Grosz’s focus space mechanism [Grosz, 1977]). The rules record each instruction, indexed by the
goal it applies to and its place in the instruction sequence. This episodic “case” corresponds to a lock-step,
overspecific sequencing of the instructions given to perform the new operator. For instance, the agent encodes
that “to pick up (that is, new-op14) the red block, rb1, I was first told to move to the yellow table, yt1.”
One issue that arises here is whether and when to generalize the index and information contained within
the case. However, at this point any generalization would be purely heuristic, since the agent was unable to
explain the various steps of the episode.
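
The indexing scheme described above (each instruction stored under the goal it applies to and its
position in the sequence) can be sketched as a small lookup table. This is only an illustration under
assumptions: the class `EpisodicCase`, its methods, and the tuple encoding of a goal are hypothetical,
and in Instructo-Soar the case is held in learned rules, not in a Python dictionary.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical goal index: (operator name, specific object it was applied to).
Goal = Tuple[str, str]


class EpisodicCase:
    """Rote, overspecific record of what was instructed for each goal."""

    def __init__(self) -> None:
        self._steps: Dict[Goal, List[str]] = defaultdict(list)

    def record(self, goal: Goal, instruction: str) -> int:
        """Store an instruction under its goal; return its place in the sequence."""
        self._steps[goal].append(instruction)
        return len(self._steps[goal]) - 1

    def recall(self, goal: Goal, position: int) -> Optional[str]:
        """Return the instruction at this position for exactly this goal, if any."""
        steps = self._steps.get(goal, [])
        return steps[position] if 0 <= position < len(steps) else None


memory = EpisodicCase()
red_goal = ("new-op14", "rb1")
memory.record(red_goal, "move to the yellow table yt1")
memory.record(red_goal, "move the arm above the red block rb1")
```

Because the index includes the specific block, recalling with a different block (for example a
hypothetical `("new-op14", "gb1")`) finds nothing, which is the overspecificity the text describes.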

Finally, the agent is told “The operator is finished,” indicating that the goal of the new operator has 
been achieved. This triggers the agent to learn termination conditions for the new operator. Learning 
termination conditions is an inductive concept formation problem. Standard concept learning approaches 
may be used here; however, typically, an instructor will expect learning within a small number of examples. 
Currently, the system uses a simple heuristic: it compares the current state to the initial state the agent 
was in when commanded to perform the new operator. Everything that has changed is considered a part of
the termination conditions of the new operator. In this case, the changes are that the robot is standing at 
a table, holding a block, and the block and gripper are both up in the air. 
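
The state-comparison heuristic can be written down in a few lines. The sketch below assumes a flat
dictionary encoding of states with invented feature names; it illustrates only the "everything that
changed" rule, not the EBL machinery the system uses to carry it out.

```python
from typing import Dict

State = Dict[str, object]  # assumed flat encoding: feature name -> value


def induce_termination(initial: State, final: State) -> State:
    """Collect every feature whose value changed between the two states."""
    changed: State = {}
    for feature in sorted(initial.keys() | final.keys()):
        if initial.get(feature) != final.get(feature):
            changed[feature] = final.get(feature)
    return changed


def is_terminated(state: State, conditions: State) -> bool:
    """The operator counts as achieved when all induced conditions hold."""
    return all(state.get(f) == v for f, v in conditions.items())


# Illustrative states for "Pick up the red block" (feature names are invented).
initial = {"robot-at": "nowhere", "holding": None, "gripper": "down",
           "block-height": "on-table", "block-color": "red"}
final = {"robot-at": "table", "holding": "block", "gripper": "up",
         "block-height": "up", "block-color": "red"}

conditions = induce_termination(initial, final)
```

In this example the induced conditions include the robot's location, because it changed, and exclude
the block's color, because it did not.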

This heuristic forms the system’s inductive bias for learning termination conditions. It allows learning 
from a single example, but is clearly too simple. Conditions that changed may not matter; e.g., perhaps it 
doesn’t matter to picking up blocks that the robot ends up at a table. Unchanged conditions may matter;
e.g., if learning to “build a stoplight”, block colors are important.

Instructo-Soar performs this induction by EBL over an overgeneral theory (as, e.g., [Miller and Laird,
1991; VanLehn et al., 1990; Rosenbloom and Aasman, 1990]). Although not sophisticated here, this type of
inductive learning has the advantage that the agent can alter the bias to reflect other available knowledge.
This might include more instruction (e.g., simply asking which features are relevant [Laird et al., 1990]),
analogy to other known operators (e.g., pick up actions in related domains), domain heuristics, etc.

On the first pass, then, the agent: 

• Carries out a sequence of instructions achieving a new operator. 

• Learns a new operator template. 
• Learns the mapping from natural language to the new operator.

• Learns a rote execution sequence for the new operator. 

• Induces the termination conditions of the new operator.

Since the agent has learned all of the necessary parts of an operator, it will be able to perform the same task
again without instruction. However, since the implementation of the operator is rote, it can only perform
the exact same task. It has not learned generally how to pick up blocks yet.

Since the goal is now known, the system could explain and generalize the instruction sequence directly
after the first execution. This is a reasonable possibility, but the multi-step simulation required has two
disadvantages:

1. The agent’s ongoing performance of the tasks at hand (either by acting or by taking more instruction)
is temporarily suspended. This could be awkward if instruction of the new procedure being simulated
is nested within the instructions for larger tasks that must still be completed, or if these tasks have
temporal constraints.

2. The multiple step simulation is susceptible to compounding of domain theory errors. That is, a
significant error in simulating any step of the procedure (or the [...])
can lead to an incomplete or incorrect explanation of goal achievement. Simulating to explain each
individual instruction, as described below, avoids this problem because each successive simulation
begins from the current external state, which reflects the true effects of the previous instructions.

Thus, we have opted for generalizing on future executions. 

5.2 Later executions 

Later in the same situation the agent is again asked to pick up the red block. The agent selects the newly
learned operator, and then reaches an impasse because it does not yet know the general implementation
sequence for the operator (how to pick up blocks in the general case). Here the agent attempts to recall
instructions it was given during the first execution. It retrieves, instruction by instruction, the rote case
learned previously. [...] carrying out the instruction in the external world, the agent attempts to
explain to itself why the instructed operator leads to achievement of the higher-level goal of picking up the
block. This explanation attempt takes the form of an internal simulation. Starting from the current world
state, the agent internally simulates the recalled operator. Thus, the situatedness of the instruction plays a
key role in the learning process, because the current situation grounds the explanation. The agent continues
[...] operators until it either reaches its higher-level goal (internally) or does not know what to do
next. If the goal is reached, the path taken to the goal comprises an explanation of how the recalled operator
[...]

[...] this explanation, the system learns a general rule that proposes the recalled operator under the
right conditions. The rule both generalizes and specializes the original instruction. In the “move to the yellow
table” case, the color of the table is generalized away, because it was not critical for achievement of the
goal, while the fact that the table has the block to be picked up on it is included in the proposal
conditions.

After learning the complete general implementation, the agent will perform the task without reference to
[...]. If [...] “Pick up the green block,” new-op14 is selected and instantiated with the green block
[...] to be performed from the same initial state each time,
and its steps were not conditional on the state.
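
One way to see why the table's color drops out while the block's location stays in is to regress the
goal backwards through the simulated path. The sketch below uses plain goal regression over
propositional features as a simplified stand-in for chunking; the operator definitions, feature names,
and the functions `simulate` and `explain` are assumptions made for illustration, not the system's
actual domain theory.

```python
from typing import Dict, List, NamedTuple

State = Dict[str, object]


class Primitive(NamedTuple):
    name: str
    pre: State      # features that must hold before the operator applies
    effects: State  # features the operator sets


def simulate(state: State, path: List[Primitive]) -> State:
    """Internally apply each operator in turn; fail if a precondition is unmet."""
    state = dict(state)
    for op in path:
        if any(state.get(f) != v for f, v in op.pre.items()):
            raise ValueError(f"cannot simulate {op.name}")
        state.update(op.effects)
    return state


def explain(state: State, path: List[Primitive], goal: State) -> State:
    """Return the features of the starting state that the goal depends on."""
    final = simulate(state, path)
    if any(final.get(f) != v for f, v in goal.items()):
        raise ValueError("goal not reached; no explanation")
    needed = dict(goal)
    for op in reversed(path):
        # Drop what this operator establishes, then require its preconditions.
        needed = {f: v for f, v in needed.items() if f not in op.effects}
        needed.update(op.pre)
    return needed


path = [
    Primitive("move-to-table", {}, {"at-table": True}),
    Primitive("move-above", {"at-table": True, "block-on-table": True, "arm": "up"},
              {"above-block": True}),
    Primitive("move-down", {"above-block": True}, {"arm": "down"}),
    Primitive("close-hand", {"arm": "down", "above-block": True}, {"holding": True}),
    Primitive("move-up", {}, {"arm": "up"}),
]
start = {"table-color": "yellow", "block-on-table": True, "at-table": False,
         "arm": "up", "above-block": False, "holding": False}
proposal_conditions = explain(start, path, {"holding": True, "arm": "up"})
```

Here the resulting conditions mention that the block is on the table but not the table's color,
because no operator on the path ever tests the color.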


6 Results 


In the robotic domain described earlier, Instructo-Soar has been applied to learn a hierarchy of task operators,
shown in Figure 2. The system read 24 natural language instructions (14 unique sentences) and learned 1357 [...]
[Figure 2: hierarchy of task operators. Operators appearing in the figure: move left of (block1, block2);
pick up (block); put down (block); grasp (block); move to table (table); move above (block); move arm left
(block2); move arm up; move arm down; open gripper; close gripper. [...] Primitive operators are shown in
light print; [...]]

operator template        [...]
operator proposal        20
operator termination     8
nl mapping               8
instruction recall       12
segment recall           8
referent resolution      770
natural language         481
other                    46

Figure 3: Decision cycles vs. learning trial for executing “pick up”. 


[...] As shown in the figure, it learned four completely new operators (shown in bold), how
to achieve preconditions for a known operator (move above), and the extension of a new operator (extending
[...] the robot already is holding a block after initially learning “pick up” when the gripper
was empty). This hierarchy can be taught using many different instruction orderings. For instance, new
[...] suboperators (e.g., grasp) can be taught either before teaching higher operators
[...] or during teaching of them. If during, the agent recursively requests instruction for the lower
[...]. Thus, the instructor need not know whether the agent knows a procedure before commanding
it. This is an important advantage of interactive instruction for autonomous agents, which may have large
amounts of knowledge. Similarly, the instructor may suggest an action that has unmet preconditions (thus
[...] instruction sequence), assuming the agent knows how to achieve them before performing
[...]. If the agent does not, it can request more instruction, and learn how to achieve the preconditions.

Instructo-Soar exhibits a number of interesting learning characteristics:

• Recall strategies. Instructo-Soar has two strategies it can use in recalling past instructions.
After recalling and internally simulating an instructed operator, the agent still may not know how that
operator leads to the goal. At this point, the agent may terminate its internal simulation, and carry
out the recalled operator in the external world. This is a single recall strategy, which is appropriate
when the agent is under pressure to act quickly. Alternatively, the agent may attempt to recall further
instructions, simulating each in turn, until the higher-level goal is reached. This is a multiple recall
strategy, which leads to faster learning, but is more susceptible to errors in the agent’s domain theory
(primitive operator knowledge), as described above.

• Bottom-up learning (single recall strategy). Limiting recall to a single step allows only a single sub- 
operator per execution to be generalized. Generalized learning begins at the end of the implementation 
sequence and moves towards the beginning. As Figure 3 shows for learning “pick up”, the resulting 
learning curve closely approximates the power law of practice [Rosenbloom and Newell, 1986] (r = 0.98).
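
The bottom-up order can be illustrated with a toy model. The sketch below rests on one simplifying
assumption that is ours, not the paper's stated mechanism: under single recall, a recalled step can be
explained only when every later step already has a general rule, so that internal simulation can
continue to the goal. The function name and list encoding are hypothetical.

```python
from typing import List


def single_recall_history(n_steps: int) -> List[List[bool]]:
    """Return, for each later execution, which steps have a general rule."""
    generalized = [False] * n_steps
    history = []
    while not all(generalized):
        for i in range(n_steps):
            # Assumption: step i is explainable only if all later steps are
            # already general, so simulation can run on to the goal.
            if not generalized[i] and all(generalized[i + 1:]):
                generalized[i] = True
                break  # single recall: at most one step generalized per execution
        history.append(list(generalized))
    return history


history = single_recall_history(3)
```

For a three-step implementation the last step is generalized on the first later execution, the
middle step on the second, and the first step on the third, matching the end-to-beginning order
described in the text.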
• Effectiveness of hierarchical instruction (single recall strategy). Due to the bottom-up effect, the
[...] showing that subjects given procedural instructions learn and perform better when they have
a domain model [Kieras and Bovair, 1984].

7 Conclusion 

[...] generalized by using internal simulation [...] of directions. In addition to positive imperative
sentences, we are [...] investigating learning from other types [...]
the induced conditions. Finally, allowing weaker [...] theories (e.g., [Pazzani, 1991; VanLehn et al., 1992;
Schank and Leake, 1989]) would lead to more graded [...] might also lead [...] unknown effects of primitive
actions.